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Abstract: 

This paper describes, in detail, techniques for measuring the Hurst parameter. Measurements are given on ar- 
tificial data both in a raw form and corrupted in various ways to check the robustness of the tools in question. 
Measurements are also given on real data, both new data sets and well-studied data sets. All data and tools used 
are freely available for download along with simple "recipes " which any researcher can follow to replicate these 
measurements. 



Keywords: long-range dependence, data analysis, Hurst parameter, Internet traffic 



1 INTRODUCTION AND BACKGROUND 



Long-Range Dependence (LRD) is a statistical 
phenomenon which has received much attention 
in the field of telecommunications in the last ten 
years. A time-series is described as possessing 
LRD if it has correlations which persist over all 
time scales. A good guide to LRD is given by 
IBeraijl Il994ll and a summary in t he con t ext of 
telecommunications is given by IClegel [2004, 
chapter one] (from which some of the material in 
this paper is taken). In the early nineties, LRD 
was measured in time-series derived from Interne t 
traffic I Lelan d~Taqqu. Willinger. and Wilsoni n"993l. 
The importance of this is that LRD can impact 
heavily on queuing. LRD is characterised by the 
parameter H, the Hurst parameter, (named for a 
hvdrol o gist w ho pioneered the field in the fifties 
iHursd 1 195 ill ') where H G (1/2,1) indicates the 
presence of LRD. There are a number of differ- 
ent statistics which can be used to estimate the 
Hurst parameter and several papers have been 
written comparing these estimators both in theory 
and practic e | Taaau. Teverovskv. and Willinger, 
iTaaau andTeverovskvl 1 1997 



Bard et. Lang. O ppenheim. Phillip e. Stoev. and T aqqu. 



2003all . The aim of this paper is not to make a rig- 
orous comparison of the estimators but, instead, 
to present a simple and readable guide to what a 
researcher can expect from attempting to assess 



whether LRD is absent or present in a data set. 
All the tools used are available online using free 
software. Software can be downloaded from: 

www . richardclegg .org/ 
lrdsources/ software/ 

1.1 Long-Range Dependence in Telecommunica- 
tions 

In their classic paper, Le land et al 

iLeland. Taaau. Willinger. and Wilsonl Il993ll 
measure traffic past a point on an Ethernet Local 
Area Network. They conclude that "In the case of 
Ethernet LAN traffic, self-similarity is manifested 
in the absence of a natural length of a 'burst'; at 
every time scale ranging from a few milliseconds 
to minutes and hours, bursts consist of bursty 
sub-periods separated by less burst sub-periods. 
We also show that the degree of self-similarity 
(defined via the Hurst parameter) typically depends 
on the utilisation level of the Ethernet and can 
be used to measure 'burstiness' of LAN traffic." 
Since then, a number of authors have replicated 
these experiments on a variety of measurments of 
Internet traffic and the majority found evidence 
of LRD or related multi-fractal behaviour. Sum- 



maries are given in ISahinoglu and Tekinavl ll99S 
Iwillinger. Paxson. Riedi. and Taqqij. l2Q03ll . The 
reason for the interest in the area is that LRD can, 
in some circumstances, negatively impact network 
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performance. The exact details of the scale and 
nature of the effect are uncertain and depend on the 
particular LRD process being considered. 

1.2 A Brief Introduction to Long-Range Depen- 
dence 

Let {X t : t £ N} be a time-series which is weakly 
stationary (that is it has a finite mean and the co- 
variance depends only on the separation or "lag" be- 
tween two points in the series). Let p(k) be the auto- 
correlation function (ACF) of X t . 

Definition 1 The ACF, p(k) for a weakly-stationary 
time series, {X t : t £ N} is given by 

E[(X t -p)(X t+k -^)] 
p{k) = , 

where E [X t ] is the expectation of X t , (i is the mean 
and a 2 is the variance. 

There are a number of different definitions of LRD in 
use in the literature. A commonly used definition is 
given below. 

Definition 2 The time-series Xt is said to be long- 
range dependent if J2T=-oo P(&) diverges. 

Often the specific functional form 

p(k) ~ C p k~ a , (1) 

is assumed where C p > and a £ (0, 1). Note that 
the symbol ~ is used here and throughout this paper 
to mean asymptotically equal to or f[x) ~ g(x) =>■ 
f(x)/g(x) = 1 as x — > oo or, where indicated, as 
x — > 0. The parameter a is related to the Hurst pa- 
rameter via the equation a = 2 — 2H. 

If Q holds then a similar definition can be shown 
to hold in the frequency domain. 

Definition 3 The spectral density /(A) of a function 
with ACF p(k) and variance a 2 can be defined as 

2 00 

Z7T * — ' 

k— — oo 

where A is the frequency, a 2 is the variance and i = 

Note that this definition of spectral density come s 
from the Wiener- Kninchine theorem IWeinerl [T9301. 

Definition 4 The weakly-stationary time-series X t is 
said to be long-range dependent if its spectral density 
obeys 

f(\)~c f \\\-e, 



as A — > 0, for some Ct > and some real [3 £ (0, 1). 

The parameter /3 is related to the Hurst parameter by 
H=(l + [3)/2. 

LRD relates to a number of other areas of 
statistics, notably the presence of statistical self- 
similarity. Self-similarity can be characterised by a 
self-similarity parameter H. If a self-similar process 
has stationary increments and H £ (1/2, 1) then its 
increments themselves, taken as a process, form an 
LRD process with Hurst parameter H. Indeed anal- 
ysis of telecommunications traffic is often described 
in terms of self-similarity and not long-range depen- 
dence. (Sometimes the phrase "asymptotic second- 
order self-similarity" is used. This refers to self- 
similarity in the data when it is aggregated and is syn- 
onymous with LRD.) 

In summary, LRD can be thought of in two ways. 
In the time domain it manifests as a high degree of 
correlation between distantly separated data points. 
In the frequency domain it manifests as a signifi- 
cant level of power at frequencies near zero. LRD 
is, in many ways, a difficult statistical property to 
work with. In the time-domain it is measured only 
at high lags (strictly at infinite lags) of the ACF — 
those very lags where only a few samples are avail- 
able and where the measurement errors are largest. 
In the frequency domain it is measured at frequen- 
cies near zero, again where it is hardest to make mea- 
surements. Time series with LRD converge slowly to 
their mean. While the Hurst parameter is perfectly 
well-defined mathematically, it will be shown that it 
is, in fact, a very difficult property to measure in real 
life. 

2 MEASURING THE HURST PARAMETER 

While the Hurst parameter is perfectly well- 
defined mathematically, measuring it is problematic. 
The data must be measured at high lags/low frequen- 
cies where fewer readings are available. Early es- 
timators were biased and converged only slowly as 
the amount of available data increased. All estima- 
tors are vulnerable to trends in the data, periodicity 
in the data and other sources of corruption. Many 
estimators assume specific functional forms for the 
underlying model and perform poorly if this is mis- 
specified. The techniques in this paper are chosen for 
a variety of reasons. The R/S parameter, aggregated 
variance and periodogram are well-known techniques 
which have been used for some time in measurements 
of the Hurst parameter. The local Whittle and wavelet 
techniques are newer techniques which generally fare 
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well in comparative studies. All the techniques cho- 
sen have freely available code which can be used with 
free software to estimate the Hurst parameter. 

The problems with real-life data are worse than 
those faced when measuring artificial data. Real life 
data is likely to have periodicity (due to, for exam- 
ple, daily usage patterns), trends and perhaps quan- 
tisation effects if readings are taken to a given pre- 
cision. The naive researcher taking a data set and 
running it through an off-the-shelf method for esti- 
mating the Hurst parameter is likely to end up with a 
misleading answer or possibly several different mis- 
leading answers. 

2.1 Data sets to be studied 

A large number of methods are used for 
generating data exhibiting LRD. A review of 
some of the better known methods are give n in 
I B ard et. Lang. Qppenheim. Phillipe. and Taaaut 
l2003bll . In this paper trial data sets with LRD 
and a known Hurst parameter are generated using 
fractional auto-regressive integrated moving average 
(FARIMA) modelling and fractional Gaussian noise 
(FGN). The software used to generate the data 
is included with at the web address previously 
mentioned. 

A FARIMA model is a well-known time series 
modelling technique. It is a modification of the stan- 
dard time series ARIMA (p, d, q) model. An ARIMA 
model is defined by 



9 



(l-^^)(l-B) d X i = (l-^ 

3=1 3 = 1 



where p is the order of the AR part of the model, the 
fa are the AR parameters, p is the order of the MA 
part of the model, the 6j are the MA parameters, d G 
Z is the order of differencing, the e% are i.i.d. noise 
(usually normally distributed with zero mean) and B 
is the backshift operator defined by H(X t ) = X t -\. 
If, instead of being an integer, the model is changed 
so that d G (0, 1/2) then the model is a FARIMA 
model. If the fa and di are chosen so that the model 
is stationary and d E (0, 1/2) then the model will 
be LRD with H = d + 1/2. FARIMA pro cesses 
were proposed by iGranger and Joveuxlfl9 801 and a 
description in the context of LRD can be found in 
lBeranLll994 pages 59-66]. 

Fractional Brownian Motion is a process Bjj{t) 
for t > obeying, 

• Bh(0) = almost surely, 

• Bfj (t) is a continuous function of t, 



• The distribution of Bjf (t) obeys 

¥[B H (t + k)- B H (t) <x 

X 

-u 2 \ 



(2n)-2k- 



-H 



exp 



2k 2H ) 



du, 



where H <G (1/2,1) is the Hurst parameter. The 
process Bh it) is known as fractional Brownian mo- 
tion (FBM) and its increments are known as frac- 
tional Gaussian noise (FGN). FBM is a self-similar 
process with self-similarity parameter H and, when 
H £ (1/2, 1), FGN exhibits long-range dependence 
with Hurst parameter H. When H = 1/2 in the 
above, then the process is the well known Weiner pro- 
cess (Brownian motion) and the increments are inde- 
pendent (Gaussian noise). A number of authors have 
described computationally efficient methods for gen- 
erating F GN and FBM . The one used in this paper is 
due to llPaxsonlll997ll . 

Data generated from these models will be tested 
using the various measurement techniques and then 
the same data set will be corrupted in several ways to 
see how this disrupts measurements: 

• Addition of zero mean AR(1) model with a 
high degree of short-range correlation (X t = 
0.9X t -i + £t). This simulates a process with 
very high local correlations which might be mis- 
taken for a long-range dependence. 

• Addition of periodic function (sine wave) — ten 
complete cycles of a sin wave are added to the 
signal. This simulates a seasonal effect in the 
data, for example, a daily usage pattern. 

• Addition of linear trend. This simulates growth 
in the data, for example the data might be a sam- 
ple of network traffic at a time of day when the 
network is growing busier as time continues. 

The noise signals are normalised so the standard devi- 
ation of the corrupting signal is identical to the stan- 
dard deviation of the original LRD signal to which it 
is being added. Note that strictly speaking, while the 
addition of an AR( 1) model does not change the LRD 
in the model and theoretically will leave the Hurst 
parameter unchanged, techincally the addition of a 
trend or of periodic noise makes the time-series non- 
stationary and hence the time-series produces are, 
strictly speaking, not really LRD. 

In addition, some real-life traffic traces are studied 
to provide insight into how well different measure- 
ments agree across data sets with and without various 
transforms being applied to clean the data. The data 
sets used are listed below. 
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• The famous (and muc h-studied) Bellcore 
data iLeland and Wilsoil 1 1 99 ill which was 
collected in 1989 and has been used for 
a large number of studies since. Note 
that, unfortunately, the exact traces u sed in 
iLeland. Taqqu. W illinger . and Wilsoni Il993ll 
are not available for download. Data from the 
same sites collected at a similar time is available 
online at: 

ita . ee . lbl . gov/ 
html/contrib/BC . html 

• A data set collected at the University of York in 
2001 which consists of a tcpdump trace of 67 
minutes of incoming and outgoing data from the 
external link to the university from the rest of the 
Internet. 

Three techniques (listed below) were tried to filter 
real-life traces in addition to making measurements 
purely on the raw data. These methods have been 
selected from the literature as techniques commonly 
used by researchers in the field. Often a high pass fil- 
ter would be used to remove periodicity and trends. 
However, since LRD measurements are most impor- 
tant at low-frequency, that is an obviously inappropri- 
ate technique. 

• Transform to log of original data (only appropri- 
ate if data is positive). 

• Removal of mean and linear trend (that is, sub- 
tract the best fit line Y = at + b for constant a 
and b). 

• Removal of high order best-fit polynomial of de- 
gree ten (the degree ten was chosen after higher 
degrees showed evidence of overfitting). 

Note that the "transform to log" option is not avail- 
able if the data contains zeros. In practice some rule 
of thumb could be considered for replacing zeros with 
a minimal value but this substitution was not done 
here and this pre-processing technique has not been 
used where the data contains zeros. 

2.2 Measurement techniques 

The measurement techniques used in this paper 
can only be described briefly but references to fuller 
descriptions with mathematical details are given. The 
techniques used here are chosen for various rea- 
sons. The R/S statistic, aggregated variance and peri- 
odogram are well-known techniques with a consid- 
erable history of use in estimating long-range de- 
pendence. The wavelet analysis technique and local 



Whittle estimator are newer techniques which per- 
form well in comparative studies and have strong the- 
oretical backing. 

The R/S statistic is a well-known technique for 
estimatin g the Hurst parameter. It is di scussed 
in [M andelbr ot and WallisL Il969ll and also iBeranl 
1 1994 pages 83-871. Let R(n) be the range of the 
data aggregated (by simple summation) over blocks 
of length n and S 2 (n) be the sample variance of 
the data aggregated at the same scale. For FGN or 
FARIMA series the ratio R/S(n) follows 

E[R/S(n)} ~ C H n H , 

where Ch is a positive, finite constant independent of 
n. Hence a log-log plot of R/S(n) versus n should 
have a constant slope as n becomes large. A problem 
with this technique which is common to many Hurst 
parameter estimators is knowing which values of n to 
consider. For small n short term correlations domi- 
nate and the readings are not valid. For large n then 
there are few samples and the value of R/S(n) will 
not be accurate. Similar problems occur for most of 
the estimators described here. 



in 



The aggregat ed variance technique is described 
IBeraJ ll994t page 92]. It considers var (X ( - m '>) 



where x[ m%> is a time series derrived from X t by 
aggregating it over blocks of size to. The sample 
variance var (X^™)) should be asymptotically pro- 
portional to m 2H ~ 2 for large N/m and to. 

The periodogram, described 

iCeweke and Porter-Hudakl fl98l is 
by 



by 
defined 



7(A) = 



1 



2nN 



jv 

E 



X 3 e 



ijX 



where A is the frequency. For a series with finite vari- 
ance, J(A) is an estimate of the spectral density of the 
series. From Definition|4]then, a log-log plot of 1(A) 
should have a slope of 1 — 2H close to the origin. 

Whittle's estimator is a Maxmimum Likelihood 
Estimator which assumes a functional form for /(A) 
and seeks to minimise parameters based upon this as- 
sumption. A slight issue with the Whittle estimator 
is that the user must specify the functional form ex- 
pected, typically either FGN or FARIMA (with the 
order specified). If the user misspecifies the underly- 
ing model then errors may occur. Local Whittle is a 
semi-parametric version of this which only assumes a 
functional form for the spec tral density at frequencies 
near zero | Robinson, 1995]. 

Wavelet analysis has been used with success both 
to measure the Hurst parameter and also to simulate 
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data lRiedill2Q03ll . Wavelets can be thought of as akin 
to Fourier series but using waveforms other than sine 
waves. The estimator used here fits a straight line to 
a frequency spectrum derived using wavelets. A 95% 
confidence interval is given, however, this should be 
interpreted only as a confidence interval on the fitted 
line and, as will be seen, not as a confidence inte- 
val on the fitted Hurst parameter. This is an impor- 
tant distinction — it is tempting to consider the con- 
fidence intervals given by some estimators as literally 
confidence intervals on the measurement of H . Often 
this is not the case (as in the case of Wavelet analysis) 
or is only the case if certain conditions are met. 

3 RESULTS 

Results here are in two sections. Firstly, results 
are given for simulated data. In these cases the ex- 
pected "correct" answer is known and therefore it can 
be seen how well the estimators have performed. The 
data is then corrupted by the addition of noise with 
the same standard deviation as the original data sets. 
Three types of noise are considered as described pre- 
viously. 

In the second section results are given for real 
data. The York data is analysed as a time series of 
bytes per unit time for two different time units. The 
Bellcore data is analysed both in terms of interarrival 
times and in terms of bytes per unit time. Note that, 
strictly speaking, the interarrival times do not consi- 
titute a proper "time-series" since the time units be- 
tween readings are not constant. 

3.1 Results on Simulated Data 

For each of the simulation methods chosen, traces 
have been generated. Each trace is 100,000 points 
of data. Hurst parameters of 0.7 and 0.9 have been 
chosen to represent a low and a high level of long- 
range dependence in data. The errors on the wavelet 
estimator are a 95% confidence interval on the fitted 
regression line (not, as might be thought, the Hurst 
parameter measured). 

Table ^ shows results for various FGN models. 
Three runs each are done with a Hurst parameter of 
0.7 and then 0.9. Firstly it should be noted that, in 
all cases, for H=0.7 all estimators are relatively close 
when no noise is applied. The R/S method performs 
worst, as it consistently underestimates the Hurst pa- 
rameter. The addition of AR(1) noise confuses all 
the methods with the Local Whittle performing par- 
ticularly poorly. The correct answer is well outside 
the confidence intervals of the Wavelet estimate after 
this addition (although, as previously stated, the con- 



fidence interval should not be taken literally). Ad- 
dition of a sine wave or a trend causes trouble for 
the aggregated variance method but the frequency do- 
main methods (wavelets and local Whittle) do not 
seem greatly affected. 

When considering runs with Hurst parameter 
H=0.9, the R/S method gets a considerable under- 
estimate even with no corrupting noise. Note also 
that the R/S and aggregated variance method actually 
produce quite different estimates for the three runs. 
Most methods seem to perform badly with the AR(1) 
noise corruption. Again the frequency domain meth- 
ods seem to be able to cope with the sine wave and 
with the addition of a trend. The aggregated vari- 
ance method seems to perform particularly badly in 
the presence of a corrupting sin wave and a corrupt- 
ing trend (perhaps not surprising as such a series is 
no longer weakly stationary). 

Table [2] shows a variety of results for FARIMA 
models. The first three runs are for a FARIMA 
(0, d, 0) model (that is one with no AR or MA com- 
ponents) and with a Hurst parameter H = 0.7. In this 
case, all methods peform adequately with no noise 
(although the R/S plot perhaps underestimates the an- 
swer). Addition of AR(1) noise causes problems for 
the R/S plot, wavelet and local Whittle methods and 
to a lesser extent the periodogram. The addition of a 
sin wave and a trend causes problems for the aggre- 
gated variance. 

For a FARIMA (1, d, 1) model with H = 0.7 and 
with the AR parameter </>i = 0.5 and the MA pa- 
rameter 9i — 0.5 (implying a moderate degree of 
short range correlation) all estimators provide a rea- 
sonable result for the uncorrupted series. As before, 
the wavelet and local Whittle method seem relatively 
robust to the addition of a trend. The AR(1) noise 
again causes problems for most of the methods. 

For a FARIMA (0, d, 0) model with H = 0.9 
the R/S method under predicts the Hurst parameter 
but all others perform well in the absence of noise. 
The AR(1) noise causes problems for the local Whit- 
tle and wavelet methods and the sine wave and trend 
cause problems for the aggregated variance. 

For a FARIMA (1, d, 1) model with H = 0.9 and 
with the AR parameter <f>x — 0.5 and the MA pa- 
rameter 9i — 0.5 (implying, as before, a moderate 
degree of short range correlation) all estimators do 
relatively well initially. The corruption produces the 
same problems with the same estimators — that is to 
say, wavelets and local Whittle do not cope with the 
AR(1) noise and Aggregated variance reacts badly to 
the sine wave and local trend. 

For a FARIMA (2, d, 1) model with H = 0.9 and 
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Added 


R/S Plot 


Aggreg. 


Period. 


Wavelet 


Local 


Noise 




Variance 


ogram 


Estimate 


Whittle 




100,000 points FGN — H= 


0.7 — run one. 




None 


0.66 


0.668 


0.686 


0.707 ±0.013 


0.72 


AR(1) 


0.767 


0.657 


0.794 


0.888 ± 0.034 


0.904 


Sin 


0.667 


0.969 


0.692 


0.707 ±0.013 


0.787 


Trend 


0.66 


0.968 


0.777 


0.707 ±0.013 


0.766 




100,000 points FGN — H= 


0.7 — run two. 




None 


0.641 


0.692 


0.7 


0.694 ± 0.007 


0.721 


AR(1) 


0.775 


0.671 


0.795 


0.882 ±0.036 


0.902 


Sin 


0.66 


0.97 


0.705 


0.694 ± 0.007 


0.788 


Trend 


0.641 


0.968 


0.769 


0.694 ± 0.007 


0.765 




100,000 points FGN 


— H= 


0.7 — run three. 




None 


0.636 


0.69 


0.704 


0.708 ± 0.009 


0.723 


AR(1) 


0.734 


0.654 


0.79 


0.876 ± 0.038 


0.905 


Sin 


0.64 


0.969 


0.709 


0.708 ± 0.009 


0.787 


Trend 


0.636 


0.971 


0.783 


0.708 ± 0.009 


0.77 




100,000 points FGN — H= 


0.9 — run one. 




None 


0.782 


0.864 


0.905 


0.901 ± 0.009 


0.934 


AR(1) 


0.805 


0.784 


0.88 


0.969 ± 0.042 


1.066 


Sin 


0.772 


0.961 


0.907 


0.901 ± 0.009 


0.945 


Trend 


0.782 


0.958 


0.928 


0.901 ± 0.009 


0.939 




100,000 points FGN — H= 


0.9 — run two. 




None 


0.862 


0.837 


0.891 


0.902 ± 0.003 


0.933 


AR(1) 


0.856 


0.76 


0.877 


0.969 ± 0.038 


1.062 


Sin 


0.858 


0.955 


0.894 


0.902 ± 0.003 


0.943 


Trend 


0.862 


0.954 


0.921 


0.902 ± 0.003 


0.938 




100,000 points FGN — H= 


0.9 — run two. 




None 


0.793 


0.884 


0.907 


0.904 ± 0.007 


0.93 


AR(1) 


0.818 


0.802 


0.871 


0.972 ± 0.041 


1.066 


Sin 


0.8 


0.967 


0.91 


0.904 ± 0.007 


0.943 


Trend 


0.794 


0.959 


0.924 


0.904 ± 0.007 


0.936 



Table 1: Results for Fractional Gaussian Noise models plus various forms of noise. 
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Added 


R/S Plot 


Aggreg. 


Period. 


Wavelet 


Local 


Noise 




Variance 


ogram 


Estimate 


Whittle 




100,000 points FARIMA (0,d,0) 


— H = 0.7 — run one. 


None 


0.663 


0.692 


0.699 


0.696 ± 0.004 


0.681 


AR(1) 


0.823 


0.673 


0.792 


0.896 ± 0.033 


0.876 


Sin 


0.665 


0.972 


0.704 


0.696 ± 0.004 


0.765 


Trend 


0.662 


0.973 


0.786 


0.696 ± 0.004 


0.746 




100,000 points FARIMA (0,d,0) 


— H= 0.7 — run two. 


None 


0.706 


0.701 


0.71 


0.702 ± 0.007 


0.679 


AR(1) 


0.837 


0.673 


0.791 


0.891 ±0.034 


0.873 


Sin 


0.714 


0.972 


0.714 


0.702 ± 0.007 


0.764 


Trend 


0.706 


0.972 


0.782 


0.702 ± 0.007 


0.742 




100,000 points FARIMA (0,d,0) 


— H= 0.7 — run three. 


None 


0.718 


0.684 


0.696 


0.687 ± 0.005 


0.679 


AR(1) 


0.827 


0.667 


0.776 


0.868 ± 0.044 


0.872 


Sin 


0.723 


0.973 


0.701 


0.687 ± 0.005 


0.765 


Trend 


0.718 


0.972 


0.778 


0.687 ± 0.005 


0.743 


100,000 points FARIMA(l,d,l) — H= 0.7, fa = 0.5, 


0i = 0.5. 


None 


0.684 


0.693 


0.706 


0.697 ± 0.006 


0.68 


AR(1) 


0.818 


0.656 


0.774 


0.88 ± 0.041 


0.878 


Sin 


0.689 


0.973 


0.71 


0.697 ± 0.006 


0.766 


Trend 


0.684 


0.972 


0.786 


0.697 ± 0.006 


0.743 


100,000 points FARIMA (0,d,0) — H = 0.9. 


None 


0.757 


0.882 


0.91 


0.886 ± 0.004 


0.861 


AR(1) 


0.804 


0.789 


0.873 


0.969 ± 0.036 


1.011 


Sin 


0.764 


0.967 


0.913 


0.886 ± 0.004 


0.883 


Trend 


0.757 


0.974 


0.933 


0.886 ± 0.004 


0.875 


100,000 points FARIMA (l,d,l) — H= 0.9, fa = 0.5, 


0i = 0.5. 


None 


0.856 


0.854 


0.881 


0.887 ± 0.006 


0.858 


AR(1) 


0.888 


0.773 


0.874 


0.959 ± 0.04 


1.001 


Sin 


0.86 


0.963 


0.885 


0.887 ± 0.006 


0.879 


Trend 


0.856 


0.968 


0.92 


0.887 ± 0.006 


0.872 


100,000 points FARIMA (2,d,l) 


— H= 0.9, fa = 0.5, fa = 


O.2,0i = 0.1. 


None 


0.807 


0.74 


0.817 


0.966 ± 0.048 


1.05 


AR(1) 


0.814 


0.691 


0.822 


1.007 ±0.059 


1.136 


Sin 


0.8 


0.94 


0.821 


0.966 ± 0.048 


1.052 


Trend 


0.807 


0.939 


0.856 


0.966 ± 0.048 


1.051 



Table 2: Results for various FARIMA models corrupted by several forms of noise. 
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with the AR parameters 4>i = 0.5, fa = 0.2 and 
the MA parameter Q\ — 0.1 indicating quite strong 
short-range correlations, none of the estimators per- 
form particularly well. Presented with these results, a 
researcher would certainly not know the Hurst param- 
eter of the underlying model from looking at the re- 
sults given by the estimators. In the case of the AR(1) 
corrupted data the measurement from the Wavelet es- 
timator is outside of the usual range for the Hurst pa- 
rameter. In fact it is not unusual for Hurst parame- 
ter estimators to produce estimates outside the range 
(1/2, 1). All five estimators are producing different 
results in most cases (there is some aggreement be- 
tween the R/S plot and periodogram but it would be 
hard to put this down to anything more than coinci- 
dence and, in any case, they are agreeing on an in- 
correct value for the Hurst parameter). It is interest- 
ing that, even in this relatively simple case where the 
theoretical correct result is known, five well-known 
estimators of the Hurst parameter all fail to get the 
correct answer. 

3.2 Autocorrelations for the Artificial Data 

It's instructive to look at the ACF of these data 
sets to understand why the various methods fail or 
succeed with the data sets. Figure ^shows the ACF 
up to lag 1000 for a data set of 100,000 points of FGN 
data with H = 0.7. For this data, it is possible to 
fit "by eye" a straight line to the log-log plot of the 
ACF and obtain an estimate of the Hurst parameter. 
From Table it can be seen that all the estimators 
performed well on this data set. Note also that in the 
log-log plot it can be seen that at the higher lags the 
error on the ACF estimate is large. 

When the time series is corrupted by the addi- 
tion of AR(1) noise as described earlier in the paper 
then the ACF changes markedly. The ACF is then 
shown in Figure [2] The degree to which the ACF has 
changed is only really clear in the log-log plot. It 
can be seen that, for low lags, the ACF remains much 
higher than in the noise-free data of the previous se- 
ries. It would be difficult indeed to make a convinc- 
ing case for fitting a straight line to this data. As for 
the higher lags, the ACF estimate certainly does not 
seem to produce anything like a straight line in the 
log-log plot for lags over fifty. Two things can be 
clearly seen from this picture, firstly that it is impos- 
sible to get a good estimate of LRD simply by fitting 
a straight line to the ACF and secondly that the addi- 
tion of highly correlated short range dependent data 
can vastly change the nature of the estimation prob- 
lem. From considering this ACF it may be no surprise 



that the estimators mainly performed so badly at re- 
moving AR(1) noise. 

Finally, the ACF is shown for the a data set which 
is FARIMA (2, d, 1) with H = 0.7, fa = 0.5, fa = 
0.2, Q\ = 0.1. This was the data which proved hard- 
est to estimate in Table |2] Some of the difficulties of 
this estimation can be seen by looking at the ACF in 
Figure|5] Even before the addition of noise, it can be 
seen that this data looks as hard to find a single best 
fit line as it was in Figure|2] It is, again, unsurprising, 
that the estimators performed badly with this data set 
even without the addition of noise. 

3.3 Results on Real Data 

In analysing the real data it is hard to know where 
to begin. Since the genuine answer (if, indeed, it can 
be really said that there is a genuine answer) is not 
known it cannot be said that one result is more "right" 
than another. The suggested methods for preprocess- 
ing data (taking logs, removing a linear trend and re- 
moving a best fit polynomial — in this case of order 
ten) have all been found in the literature on measuring 
the Hurst parameter. 

Table|3]shows analysis of data collected at the Uni- 
versity of York. The same data set is analysed firstly 
as a series of bytes/second and then as bytes/tenth of 
a second. While theoretically the results should be 
the same, in practice this is not the case. Obviously 
there are only one tenth as many points in the data 
set when seconds are used rather than tenths of sec- 
onds. Firstly, looking at the data aggregated over a 
time period of one second, there is no good agree- 
ment between estimators. The periodogram estimate 
is hopelessly out of the correct range. The other esti- 
mators, while in the range (1/2,1) show no particular 
agreement. Of the suggested filtering techniques, lit- 
tle changes between them except that removal of a 
polynomial greatly reduces the estimate found by the 
periodogram and slightly reduces the estimate found 
by aggregated variance. No conclusion can realisti- 
cally be drawn about the data from these results. 

Considering the data aggregated into tenths of a 
second time units the picture is somewhat clearer. 
Taking a log of data was impossible at this time scale 
due to presence of zeros. The estimators, with the 
exception of the R/S plot are all relatively near H = 
0.9. While it seems somewhat arbitrary to ignore the 
results of the R/S plot it should be remembered that 
this technique performed poorly with high Hurst pa- 
rameter measurements on theoretical data and under- 
estimated badly in those cases. No great difference 
is observed from any of the suggested filtering tech- 
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Figure 1: ACF (left) and log-log ACF (right) for FGN with Hurst parameter H = 0.7. 
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Figure 2: ACF (left) and log-log ACF (right) for FGN with Hurst parameter H = 0.7 corrupted by AR(1) noise 
with the same variance. 



Filter 


R/S Plot 


Aggreg. 


Period. 


Wavelet 


Local 


Type 




Variance 


ogram 


Estimate 


Whittle 




York trace (bytes/second) - 


- 4047 points 




None 


0.749 


0.88 


1.186 


0.912 ±0.052 


0.981 


Log 


0.758 


0.894 


1.105 


0.921 ± 0.039 


0.932 


Trend 


0.749 


0.873 


1.212 


0.912 ±0.052 


0.981 


Poly 


0.756 


0.723 


0.732 


0.895 ± 0.04 


0.972 




York trace (bytes/tenth) — 


40467 points 




None 


0.826 


0.924 


0.928 


0.909 ± 0.012 


0.881 


Trend 


0.826 


0.923 


0.932 


0.909 ± 0.012 


0.881 


Poly 


0.827 


0.892 


0.863 


0.909 ± 0.012 


0.878 



Table 3: Analysis of bytes/unit time data collected at the University of York. 



ISSN 1473-804x online, 1473-803 1 print 11 ISSN 1473-804x online, 1473-803 1 print 



400 600 
lag, k 



lag, k 



Figure 3: ACF (left) and log-log ACF (right) for FARIMA (2, d,l) — H = 0.9, fa = 0.5, fa = 0.2, 9 l = 0.1. 



niques except, perhaps, a slight reduction in the ag- 
gregated variance and periodogram results from re- 
moval of a polynomial. A tentative conclusion from 
this data would be that 0.85 < H < 0.95 and that the 
R/S plot is inaccurate for this trace. 

In the case of the Bellcore measurements, the data 
has been split into two sections and analysed seper- 
ately for interarrival times and for bytes per unit time. 
Considering first the interarrival times, all estimators 
seem to have a result which is not too distant from 
H = 0.7 in both cases. The various filtering tech- 
niques tried do little to change this. It is hard to 
come to a really robust conclusion since the estima- 
tors are as high as 0.806 (aggregated variance after 
taking logs) and as low as 0.652 (local Whittle after 
taking logs). 

When the bytes per unit time are considered, the 
log technique cannot be used due to zeros in the 
data. The most comfortable conclusion abou this 
data might be that the Hurst parameter is somewhere 
around H = 0.8 with the R/S plot underestimating 
again. As before, it is hard to reach a strong con- 
clusion on the exact Hurst parameter. Certainly it 
would be foolish to take the confidence intervals on 
the wavelet estimator at face value. The various fil- 
ters tried seem to make little difference except per- 
haps a slight reduction in the answer given by some 
estimators after the polynomial is removed. A tenta- 
tive conclusion might be that 0.75 < H < 0.85 for 
this data with the R/S plot being in error. 



4 CONCLUSION 

This paper has looked at measuring the Hurst pa- 
rameter, firstly in the case of artificial data contami- 
nated by various types of noise and secondly in the 
case of real data with various filters to try to improve 
the performance of the estimators used. 

The most striking conclusion of this paper is that 
measuring the Hurst parameter, even in artificial data, 
is very hit and miss. In the artificial data with no cor- 
rupting noise, some estimators performed very poorly 
indeed. Confidence intervals given should certainly 
not be taken at face value (indeed should be consid- 
ered as next to worthless). 

Corrupting noise can affect the measurements 
badly and different estimators are affected in by dif- 
ferent types of noise. In particular, frequency domain 
estimators (as might be expected) are robust to the 
addition of sinusoidal noise or a trend. All estima- 
tors had problems in some circumstances with the ad- 
dition of a heavy degree of short-range dependence 
even though this, in theory, does not change the long- 
range dependence of the time series. 

When considering real data, researchers are ad- 
vised to use extreme caution. A researcher relying on 
the results of any single estimator for the Hurst pa- 
rameter is likely to be drawing false conclusions, no 
matter how sound the theoretical backing for the esti- 
mator in question. While simple filtering techniques 
are suggested in the literature for improving the per- 
formance of Hurst parameter estimation, they had lit- 
tle or no effect on the data analysed in this paper. 

All the data and tools used in this paper are avail- 
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Filter 


R/S Plot 


Aggreg. Period. 


Wavelet 


Local 


Type 




Variance ogram 


Estimate 


Whittle 


Bellcore data BC-Aug89 (interarrival times) — first 360,000 points. 


None 


0.73 


0.742 0.762 


0.73 ±0.018 


0.661 


Log 


0.722 


0.806 0.797 


0.77 ± 0.02 


0.652 


Trend 


0.73 


0.74 0.762 


0.73 ±0.018 


0.661 


Poly 


0.73 


0.733 0.751 


0.73 ±0.018 


0.66 


Bellcore data BC-Aug89 (interarrival times) — second 360,000 points. 


None 


0.709 


0.703 0.742 


0.746 ± 0.025 


0.655 


Log 


0.721 


0.795 0.779 


0.778 ±0.011 


0.673 


Trend 


0.709 


0.703 0.742 


0.746 ± 0.025 


0.655 


Poly 


0.709 


0.691 0.732 


0.746 ± 0.025 


0.654 


Bellcore data BC-Aug89 (bytes/lOms) — first 1000 sees. 


None 


0.707 


0.8 0.817 


0.786 ±0.017 


0.822 


Trend 


0.707 


0.797 0.815 


0.786 ± 0.017 


0.822 


Poly 


0.707 


0.789 0.787 


0.786 ±0.017 


0.822 


Bellcore data BC-Aug89 (bytes/1 0ms) — second 1000 sees. 


None 


0.62 


0.802 0.808 


0.762 ± 0.012 


0.825 


Trend 


0.62 


0.802 0.808 


0.762 ± 0.012 


0.825 


Poly 


0.618 


0.786 0.777 


0.762 ± 0.012 


0.824 



Table 4: Analysis of bytes/unit time and interarrival times for the Bellcore data with various methods to attempt 
to remove non-stationary components. 



able for download from the web and can be found at: 

www . richardclegg .org/ 
lrdsources/ software/ 
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